Longest Common Prefixes with k-Mismatches & Applications
Authors
Abstract
We propose a new algorithm for computing, for each suffix of a given string of length n over a constant-sized alphabet of size σ, the longest prefix that occurs elsewhere in the string with Hamming distance at most k. Specifically, we show that the proposed algorithm requires time O(n(σR)^k log log n (log k + log log n)) on average, where R = ⌈(k + 2)(log_σ n + 1)⌉, and space O(n). This improves upon the state-of-the-art average-case time complexity for the case k = 1 [Manzini, SPIRE 2015] by a factor of log n / log log n. In addition, we show how the proposed technique can be adapted to compute the longest previous factors under the Hamming distance model within the same complexities. In terms of real-world applications, we show that our technique can be directly applied to the problem of genome mappability.
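To make the problem statement in the abstract concrete, the following is a minimal brute-force sketch of the quantity being computed: for each suffix, the length of its longest prefix that also occurs at another position with at most k mismatches. It is not the algorithm proposed in the paper; it runs in cubic time in the worst case and is only meant as a reference definition. The function names `hamming_prefix_len` and `k_mismatch_lcp` are illustrative assumptions.

```python
def hamming_prefix_len(s, i, j, k):
    """Length of the longest common prefix of s[i:] and s[j:]
    when up to k mismatching positions are tolerated."""
    n = len(s)
    mismatches = 0
    length = 0
    while i + length < n and j + length < n:
        if s[i + length] != s[j + length]:
            mismatches += 1
            if mismatches > k:
                break  # the (k+1)-th mismatch ends the prefix
        length += 1
    return length


def k_mismatch_lcp(s, k):
    """For every start position i, the longest prefix length of s[i:]
    that occurs at some other position j != i within Hamming distance k.
    Brute force over all pairs (i, j); illustrative only."""
    n = len(s)
    return [
        max((hamming_prefix_len(s, i, j, k) for j in range(n) if j != i),
            default=0)
        for i in range(n)
    ]


if __name__ == "__main__":
    print(k_mismatch_lcp("abcabd", 0))
    print(k_mismatch_lcp("abcabd", 1))
```

With k = 0 this reduces to the exact longest repeated prefix at each position; raising k can only keep or increase each entry.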
Similar resources
Longest Common Substring with Approximately k Mismatches
In the longest common substring problem we are given two strings of length n and must find a substring of maximal length that occurs in both strings. It is well-known that the problem can be solved in linear time, but the solution is not robust and can vary greatly when the input strings are changed even by one letter. To circumvent this, Leimeister and Morgenstern introduced the problem of the...
Longest common substrings with k mismatches
The longest common substring with k-mismatches problem is to find, given two strings S1 and S2, a longest substring A1 of S1 and A2 of S2 such that the Hamming distance between A1 and A2 is ≤ k. We introduce a practical O(nm) time and O(1) space solution for this problem, where n and m are the lengths of S1 and S2, respectively. This algorithm can also be used to compute the matching statistics ...
Longest common substrings with k mismatches (arXiv:1409.1694v2 [cs.DS], 16 Mar 2015)
The longest common substring with k-mismatches problem is to find, given two strings S1 and S2, a longest substring A1 of S1 and A2 of S2 such that the Hamming distance between A1 and A2 is ≤ k. We introduce a practical O(nm) time and O(1) space solution for this problem, where n and m are the lengths of S1 and S2, respectively. This algorithm can also be used to compute the matching statistics...
Quantifying the Pitfalls of Traceroute in AS Connectivity Inference
Although traceroute has the potential to discover AS links that are invisible to existing BGP monitors, it is well known that the common approach for mapping router IP address to AS number (IP2AS) based on the longest prefix matching is highly error-prone. In this paper we conduct a systematic investigation into the potential errors of the IP2AS mapping for AS topology inference. In comparing t...
Computing the Longest Common Prefix of a Context-free Language in Polynomial Time
We present two structural results concerning longest common prefixes of non-empty languages. First, we show that the longest common prefix of the language generated by a context-free grammar of size N equals the longest common prefix of the same grammar where the heights of the derivation trees are bounded by 4N . Second, we show that each non-empty language L has a representative subset of at ...
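As an illustration of the longest common substring with k mismatches problem described in the entries above, the following is a minimal sketch of one way to reach O(nm) time with O(1) extra space: slide a two-pointer window along every diagonal of the implicit n × m comparison grid, shrinking the window whenever it contains more than k mismatches. This is a sketch under that assumption and is not claimed to be the cited authors' exact algorithm; the function name `lcs_k_mismatches` is hypothetical.

```python
def lcs_k_mismatches(s1, s2, k):
    """Return (length, start1, start2) for a longest pair of equal-length
    substrings s1[start1:start1+length], s2[start2:start2+length]
    whose Hamming distance is at most k."""
    n, m = len(s1), len(s2)
    best = (0, 0, 0)
    # Each diagonal d fixes the offset i - j = d between the two strings.
    for d in range(-(m - 1), n):
        i0, j0 = (d, 0) if d >= 0 else (0, -d)
        diag_len = min(n - i0, m - j0)
        left = 0          # window start along the diagonal
        mismatches = 0    # mismatches inside the current window
        for right in range(diag_len):
            if s1[i0 + right] != s2[j0 + right]:
                mismatches += 1
            # Shrink from the left until at most k mismatches remain.
            while mismatches > k:
                if s1[i0 + left] != s2[j0 + left]:
                    mismatches -= 1
                left += 1
            window = right - left + 1
            if window > best[0]:
                best = (window, i0 + left, j0 + left)
    return best


if __name__ == "__main__":
    print(lcs_k_mismatches("abcde", "xbcdy", 0))
    print(lcs_k_mismatches("abcde", "xbcdy", 1))
```

Each diagonal is scanned once with both pointers only moving forward, so the total work is proportional to the number of grid cells, and only a handful of integers are stored besides the inputs.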
Journal title:
Volume, issue:
Pages: -
Publication date: 2017